NPU Enabling and Usage
Introduction
The Dragonwing IQ-9075 EVK has a dedicated NPU that delivers up to 100 dense TOPS, which lets it run 13-billion-parameter models and generate 12 tokens per second.
On Yocto, the recipes-ml layer provides recipes for some Qualcomm AI Runtime SDK (QAIRT) components. On Ubuntu, only part of the SDK is included, enough to run some sample applications and GStreamer pipelines.
| Specification | Value |
|---|---|
| NPU name | Qualcomm Hexagon[1] |
| NPU architecture | Dual Hexagon Tensor Processors[2] |
| Compute extensions | Vector and matrix extensions[3] |
| Vector accelerator | Quad Qualcomm Hexagon Vector eXtensions (HVX)[4] |
| Matrix accelerator | Dual Qualcomm Hexagon Matrix eXtensions (HMX) coprocessors[4] |
| Integrated DSP | Qualcomm Hexagon DSP[4] |
| Peak NPU performance, QCS9075-AC | Up to 50 dense TOPS[1] |
| Peak NPU performance, QCS9075-AA | Up to 100 dense TOPS[1] |
| Peak sparse-equivalent performance | Up to 200 equivalent sparse TOPS[2] |
| INT8 AI performance | Up to 100 INT8 TOPS[3] |
| Example generative AI workload | Llama 2 7B at up to 22 tokens/s[1] |
| NPU software backend | QNN HTP backend / Hexagon Tensor Processor backend[5] |
| Quantized network support | Quantized 8-bit and quantized 16-bit networks[5] |
| Floating-point support | Float32 networks using float16 math on select Qualcomm SoCs[5] |
| Operator / layer support source | QAIRT / QNN Supported Operations, HTP backend columns[6] |
The full QAIRT SDK supports the following operations:
| Supported layers | Layer type | Datatype | Backend |
|---|---|---|---|
| Conv1d, Conv2d, Conv3d, DepthWiseConv1d, DepthWiseConv2d | Convolution | FP32 / FP16 / INT8 / INT16 | CPU, HTP, HTP FP16, GPU, LPAI |
| FullyConnected, MatMul | Dense / matrix multiplication | FP32 / FP16 / INT8 / INT16 | CPU, HTP, HTP FP16, GPU, LPAI |
| PoolAvg2d, PoolAvg3d, PoolMax2d, PoolMax3d, L2Pool2d | Pooling | FP32 / FP16 / INT8 / INT16 | CPU, HTP, HTP FP16, GPU, LPAI |
| Relu, Prelu, Elu, Gelu, HardSwish, Sigmoid, Tanh, Softplus | Activation | FP32 / FP16 / INT8 / INT16 | CPU, HTP, HTP FP16, GPU, LPAI |
| ElementWiseAdd, ElementWiseSubtract, ElementWiseMultiply, ElementWiseDivide, ElementWisePower, ElementWiseMaximum, ElementWiseMinimum, ElementWiseSquaredDifference | Element-wise arithmetic | FP32 / FP16 / INT8 / INT16 | CPU, HTP, HTP FP16, GPU, LPAI |
| ElementWiseAbs, ElementWiseCeil, ElementWiseCos, ElementWiseExp, ElementWiseFloor, ElementWiseLog, ElementWiseNeg, ElementWiseRound, ElementWiseRsqrt, ElementWiseSin, ElementWiseSquareRoot | Element-wise unary | FP32 / FP16 | CPU, HTP, HTP FP16, GPU, LPAI |
| ElementWiseEqual, ElementWiseNotEqual, ElementWiseGreater, ElementWiseGreaterEqual, ElementWiseLess, ElementWiseLessEqual | Element-wise comparison | FP32 / FP16 / INT8 / INT16 | CPU, HTP, HTP FP16, GPU, LPAI |
| ElementWiseAnd, ElementWiseOr, ElementWiseXor, ElementWiseNot | Element-wise logical | Boolean / integer | CPU, HTP, HTP FP16, GPU, LPAI |
| Batchnorm, InstanceNorm, LayerNorm, GroupNorm, L2Norm, Lrn | Normalization | FP32 / FP16 / INT8 / INT16 | CPU, HTP, HTP FP16, GPU, LPAI |
| ReduceMax, ReduceMean, ReduceMin, ReduceProd, ReduceSum, ReduceSumSquare | Reduction | FP32 / FP16 / INT8 / INT16 | CPU, HTP, HTP FP16, GPU, LPAI |
| Softmax, LogSoftmax, MaskedSoftmax | Softmax | FP32 / FP16 / INT8 / INT16 | CPU, HTP, HTP FP16, GPU, LPAI |
| Reshape, Squeeze, ExpandDims, Transpose, Permute, Pack, Unpack, Concat, Split, Slice | Tensor shape / layout | Input datatype | CPU, HTP, HTP FP16, GPU, LPAI |
| Pad, Tile, Gather, GatherElements, GatherNd, OneHot, NonZero | Tensor indexing / construction | FP32 / FP16 / integer / INT8 / INT16 | CPU, HTP, HTP FP16, GPU, LPAI |
| Quantize, Dequantize, Cast, Convert | Datatype conversion | FP32 / FP16 / INT8 / INT16 | CPU, HTP, HTP FP16, GPU, LPAI |
| Resize, CropAndResize, GridSample, ExtractPatches, DepthToSpace, SpaceToDepth, BatchToSpace | Vision / spatial transform | FP32 / FP16 / INT8 / INT16 | CPU, HTP, HTP FP16, GPU, LPAI |
| Argmax, Argmin, TopK | Selection / ranking | FP32 / FP16 / integer / INT8 / INT16 | CPU, HTP, HTP FP16, GPU, LPAI |
| Lstm, Gru | Recurrent | FP32 / FP16 / INT8 / INT16 | CPU, HTP, HTP FP16, GPU, LPAI |
| NonMaxSuppression, MultiClassNms, CombinedNms, BoxWithNmsLimit, DetectionOutput, GenerateProposals, CollectRpnProposals, DistributeFpnProposals, BboxTransform | Detection / proposal generation | FP32 / FP16 / INT8 / INT16 | CPU, HTP, HTP FP16, GPU, LPAI |
GStreamer elements for NPU use
GStreamer elements typically used for AI applications:
| Element | Type | Description |
|---|---|---|
| qtimlqnn | QNN inference | Qualcomm QNN-based inference element. This is the most direct GStreamer candidate for QNN/HTP/NPU execution; set the backend from the default CPU library to the HTP backend when validating NPU. |
| qtimltflite | TensorFlow Lite inference | Runs TFLite models. For NPU testing, use delegate=external with the QNN TFLite delegate and HTP backend options. |
| qtimlsnpe | SNPE inference | Runs SNPE/DLC models with delegate options for CPU, DSP, GPU, or AIP. Useful for legacy Qualcomm AI pipelines, but QNN/TFLite paths are more relevant for HTP/NPU validation. |
| qtimlvconverter | Video-to-tensor preprocessing | Converts video/x-raw frames into neural-network/tensors before inference. Used before qtimlqnn, qtimltflite, or qtimlsnpe in video AI pipelines. |
| qtimlaconverter | Audio-to-tensor preprocessing | Converts mono raw audio into neural-network/tensors. Supports raw, spectrogram, MFE, LMFE, and MFCC features for audio ML pipelines. |
| qtimlpostprocess | Generic ML postprocessor | Preferred post-processing element for converting inference tensors into video, text, or tensor outputs. Supports modules for detection, classification, segmentation, pose, OCR, depth, face, audio, and related AI tasks. |
| qtimldemux | Tensor demuxer | Splits batched neural-network/tensors into separate tensor streams. Useful after batched inference or when separating multiple tensor outputs. |
| qtibatch | Batch muxer | Batches buffers from multiple streams into one output buffer. Useful when testing batched or multi-stream inference. |
| qtimlmetaextractor | ML metadata extractor | Extracts ML metadata from video buffers into UTF-8 text buffers for logging, debugging, or publishing inference results. |
| qtimlmetaparser | ML metadata parser | Parses ML metadata from video or text buffers. Useful when converting inference metadata into a structured text representation such as JSON. |
| qtimetamux | Metadata muxer | Attaches text or optical-flow metadata as GstMeta to raw video/audio buffers, allowing inference results to travel with the media stream. |
| qtimetatransform | Metadata transform | Filters or transforms metadata attached to video buffers. Useful for ROI-based AI flows or smoothing label/ROI metadata. |
| qtiobjtracker | Object tracker | Tracks detected objects across frames after detection post-processing. Useful after object detection inference to maintain object IDs over time. |
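Before building a pipeline, it can help to confirm that the elements in the table are actually installed on the image. The following is a minimal sketch, assuming gst-inspect-1.0 is available on the board. The function name missing_gst_elements and the GST_INSPECT override are hypothetical and only used for illustration.

```shell
#!/bin/sh
# Print the name of every GStreamer element that is not installed.
# GST_INSPECT is a hypothetical override (not a GStreamer variable) that lets
# the check be pointed at a different gst-inspect-1.0 binary.
missing_gst_elements() {
    inspect=${GST_INSPECT:-gst-inspect-1.0}
    for element in "$@"; do
        # With --exists, gst-inspect-1.0 returns non-zero for unknown elements.
        if ! "$inspect" --exists "$element" >/dev/null 2>&1; then
            echo "$element"
        fi
    done
}

# Example (on the board):
#   missing_gst_elements qtimlvconverter qtimltflite qtimlqnn qtimlpostprocess
```

An empty output means every listed element was found.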
Running AI sample applications [7]
Qualcomm provides AI sample applications for object detection and parallel inferencing on the Dragonwing IQ-9075 device. They take input from a camera, a video file, or an RTSP stream. To run an application, use the following workflow:
- Download the models and labels
- Transfer the downloaded files to the device
- Run the AI sample applications
Download and transfer AI models and labels
The required models can be downloaded from Qualcomm AI Hub. These are the required files for some example applications:
| Sample application | Models required |
|---|---|
| AI object detection | yolox_quantized.tflite |
| Parallel AI inference | yolox_quantized.tflite, Inception-v3, HRNetPose, DeepLabV3-Plus-MobileNet |
| Multistream inference | yolox_quantized.tflite, Inception-v3 |
To download them with the automated script, create a working directory on the board:
WORKING_DIR=~/AI_Examples
mkdir -p "$WORKING_DIR"
cd "$WORKING_DIR"
Install unzip:
sudo apt install unzip
Get the script:
curl -L -O https://raw.githubusercontent.com/quic/sample-apps-for-qualcomm-linux/refs/heads/main/qualcomm-linux/scripts/download_artifacts.sh
Give it executable permission:
chmod +x download_artifacts.sh
Execute the script:
sudo ./download_artifacts.sh
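After the script finishes, a quick check that the expected files are in place can save a failed pipeline start later. This is a minimal sketch. The helper name check_artifacts is hypothetical, and the example paths are the model and label paths used by the configurations and pipelines on this page.

```shell
#!/bin/sh
# Report every path that is not a regular file.
# Returns 0 when all files exist, 1 otherwise.
check_artifacts() {
    missing=0
    for path in "$@"; do
        if [ ! -f "$path" ]; then
            echo "missing: $path"
            missing=1
        fi
    done
    return "$missing"
}

# Example (on the board):
#   check_artifacts /etc/models/yolox_quantized.tflite /etc/labels/yolox.json
```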
gst-ai-object-detection application
To set up the application, edit the configuration file at /etc/configs/config_detection.json:
sudo vim /etc/configs/config_detection.json
To run with the example video as the source, change the file as follows:
{
"file-path": "/etc/media/video.mp4",
"ml-framework": "tflite",
"yolo-model-type": "yolox",
"model": "/etc/models/yolox_quantized.tflite",
"labels": "/etc/labels/yolox.json",
"threshold": 40,
"runtime": "dsp",
"output-type": "waylandsink"
}
To run with a camera as the source:
{
"camera": 0,
"ml-framework": "tflite",
"yolo-model-type": "yolox",
"model": "/etc/models/yolox_quantized.tflite",
"labels": "/etc/labels/yolox.json",
"threshold": 40,
"runtime": "dsp",
"output-type": "waylandsink"
}
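The two configurations differ only in the input source field, so they can be generated instead of edited by hand. The following is a sketch under that assumption. The function name write_detection_config and its arguments are hypothetical, and all field values are copied from the two examples above.

```shell
#!/bin/sh
# Write a config_detection.json for a video file or a camera source.
# usage: write_detection_config <output-path> file <video-path>
#        write_detection_config <output-path> camera <camera-id>
write_detection_config() {
    out=$1
    kind=$2
    value=$3
    case $kind in
        file)   source_line="\"file-path\": \"$value\"," ;;
        camera) source_line="\"camera\": $value," ;;
        *)      echo "unknown source kind: $kind" >&2; return 1 ;;
    esac
    cat > "$out" <<EOF
{
  $source_line
  "ml-framework": "tflite",
  "yolo-model-type": "yolox",
  "model": "/etc/models/yolox_quantized.tflite",
  "labels": "/etc/labels/yolox.json",
  "threshold": 40,
  "runtime": "dsp",
  "output-type": "waylandsink"
}
EOF
}

# Example: generate the camera variant in the current directory.
#   write_detection_config ./config_detection.json camera 0
```

The generated file can then be copied to /etc/configs/config_detection.json with sudo.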
The following table lists and describes the fields in the config_detection.json file.
| Field | Values/description |
|---|---|
| ml-framework | ML framework used to run the model. The examples above use tflite. |
| yolo-model-type | One of yolov5, yolov8, yolox, or yolonas, matching the model to run. For more information about models and labels, see the Sample model and label files. |
| runtime | Runtime used for inference. The examples above use dsp. |
| Input source | Input source field. The examples above use file-path (video file) or camera (camera ID). |
| output-ip-address | Output server IP address |
| port | Output server port |
| output-type | Output type. The examples above use waylandsink. |
| USB camera video-format and resolution | Video format and resolution fields for a USB camera source. |
| output-file | Output filename. The default filename is output_detection.mp4. |
Run the application:
gst-ai-object-detection

Object detection with gst-launch-1.0
Object detection using camera and UDP sink
The download_artifacts.sh script downloads the models to /etc/models/.
On the board:
HOST_IP=X.X.X.X
PORT=5000
gst-launch-1.0 -e -v \
qtiqmmfsrc camera=0 name=camsrc \
camsrc. ! queue ! \
'video/x-raw,format=NV12_Q08C,width=1280,height=720,framerate=30/1' ! \
qtivcomposer name=mixer \
sink_0::dimensions="<1280,720>" \
sink_0::position="<0,0>" \
sink_0::zorder=0 ! \
queue ! \
'video/x-raw,format=NV12,width=1280,height=720,framerate=30/1' ! \
v4l2h264enc \
capture-io-mode=dmabuf \
output-io-mode=dmabuf-import ! \
h264parse config-interval=1 ! \
rtph264pay pt=96 config-interval=1 ! \
udpsink \
host=$HOST_IP \
port=$PORT \
sync=false \
async=false \
camsrc. ! queue ! \
'video/x-raw,format=NV12,width=640,height=360,framerate=30/1' ! \
qtimlvconverter ! \
qtimltflite \
model=/etc/models/yolox_quantized.tflite \
delegate=external \
external-delegate-path=libQnnTFLiteDelegate.so \
external-delegate-options="QNNExternalDelegate,backend_type=htp" ! \
qtimlpostprocess \
module=yolov8 \
labels=/etc/labels/yolox.json \
results=10 \
settings='{"confidence": 40.0}' ! \
'video/x-raw,format=BGRA,width=640,height=360' ! \
queue ! \
mixer.
On the host PC:
PORT=5000
gst-launch-1.0 -v udpsrc port=$PORT caps='application/x-rtp,media=video,encoding-name=H264,payload=96,clock-rate=90000' ! rtph264depay ! h264parse ! avdec_h264 ! videoconvert ! autovideosink sync=false
Object detection using filesrc and UDP sink
On the board:
HOST_IP=X.X.X.X
PORT=5000
gst-launch-1.0 -e -v \
filesrc location=/etc/media/video.mp4 ! \
qtdemux ! \
h264parse ! \
decodebin ! \
identity sync=true ! \
queue ! \
qtivtransform ! \
'video/x-raw,format=NV12,width=1280,height=720,framerate=30/1' ! \
tee name=split \
split. ! queue ! \
qtivcomposer name=mixer \
sink_0::dimensions="<1280,720>" \
sink_0::position="<0,0>" \
sink_0::zorder=0 ! \
queue ! \
'video/x-raw,format=NV12,width=1280,height=720,framerate=30/1' ! \
v4l2h264enc \
capture-io-mode=dmabuf \
output-io-mode=dmabuf-import ! \
h264parse config-interval=1 ! \
rtph264pay pt=96 config-interval=1 ! \
udpsink \
host=$HOST_IP \
port=$PORT \
sync=false \
async=false \
split. ! queue ! \
qtivtransform ! \
'video/x-raw,format=NV12,width=640,height=360,framerate=30/1' ! \
qtimlvconverter ! \
qtimltflite \
model=/etc/models/yolox_quantized.tflite \
delegate=external \
external-delegate-path=libQnnTFLiteDelegate.so \
external-delegate-options="QNNExternalDelegate,backend_type=htp" ! \
qtimlpostprocess \
module=yolov8 \
labels=/etc/labels/yolox.json \
results=10 \
settings='{"confidence": 40.0}' ! \
'video/x-raw,format=BGRA,width=640,height=360' ! \
queue ! \
mixer.
On the host PC:
PORT=5000
gst-launch-1.0 -v udpsrc port=$PORT caps='application/x-rtp,media=video,encoding-name=H264,payload=96,clock-rate=90000' ! rtph264depay ! h264parse ! avdec_h264 ! videoconvert ! autovideosink sync=false
Object detection using camera and filesink
OUT=ai_object_detection_video.mp4
gst-launch-1.0 -e -v \
qtiqmmfsrc camera=1 name=camsrc \
camsrc. ! queue ! \
'video/x-raw,format=NV12_Q08C,width=1280,height=720,framerate=30/1' ! \
qtivcomposer name=mixer \
sink_0::dimensions="<1280,720>" \
sink_0::position="<0,0>" \
sink_0::zorder=0 ! \
queue ! \
'video/x-raw,format=NV12,width=1280,height=720,framerate=30/1' ! \
v4l2h264enc \
capture-io-mode=dmabuf \
output-io-mode=dmabuf-import ! \
h264parse config-interval=1 ! \
mp4mux ! \
filesink location=$OUT \
camsrc. ! queue ! \
'video/x-raw,format=NV12,width=640,height=360,framerate=30/1' ! \
qtimlvconverter ! \
qtimltflite \
model=/etc/models/yolox_quantized.tflite \
delegate=external \
external-delegate-path=libQnnTFLiteDelegate.so \
external-delegate-options="QNNExternalDelegate,backend_type=htp" ! \
qtimlpostprocess \
module=yolov8 \
labels=/etc/labels/yolox.json \
results=10 \
settings='{"confidence": 40.0}' ! \
'video/x-raw,format=BGRA,width=640,height=360' ! \
queue ! \
mixer.

Checking NPU use
You can debug the GStreamer pipeline further by setting these environment variables:
export GST_DEBUG="qtimltflite:6,qtimlvconverter:4,qtimlpostprocess:4,*qnn*:6,*Qnn*:6"
export TFLITE_MINIMAL_LOG_LEVEL=0
export ADSP_LIBRARY_PATH="/usr/lib/rfsa/adsp:/usr/lib:/usr/lib/aarch64-linux-gnu"
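With these debug levels the output is large, so it is often easier to send it to a file and keep the terminal readable. The following is a minimal sketch that applies the debug settings to a single command only. The wrapper name run_with_gst_debug is hypothetical. GST_DEBUG_FILE and GST_DEBUG_NO_COLOR are standard GStreamer environment variables.

```shell
#!/bin/sh
# Run one command with the debug settings from this section and write the
# GStreamer debug log to a file instead of stderr.
# usage: run_with_gst_debug <log-file> <command> [args...]
run_with_gst_debug() {
    log_file=$1
    shift
    GST_DEBUG="qtimltflite:6,qtimlvconverter:4,qtimlpostprocess:4,*qnn*:6,*Qnn*:6" \
    GST_DEBUG_NO_COLOR=1 \
    GST_DEBUG_FILE="$log_file" \
    TFLITE_MINIMAL_LOG_LEVEL=0 \
        "$@"
}

# Example (on the board):
#   run_with_gst_debug /tmp/gst-debug.log gst-launch-1.0 -e <pipeline>
```

Because the variables are set only for the wrapped command, the rest of the shell session keeps its normal log level.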
To make sure the NPU is being used as the backend, run the same pipeline with different backends (HTP, CPU, and GPU) and compare the results.
Set the runtime paths:
export LD_LIBRARY_PATH=/usr/lib:$LD_LIBRARY_PATH
export ADSP_LIBRARY_PATH=/usr/lib/rfsa/adsp:/usr/lib/rfsa/adsp/hexagon-v73:/usr/lib
HTP/NPU benchmark
Run this pipeline:
MODEL=/etc/models/yolox_quantized.tflite
LABELS=/etc/labels/yolox.json
DELEGATE=/usr/lib/libQnnTFLiteDelegate.so
GST_DEBUG=2 gst-launch-1.0 -e -v \
qtiqmmfsrc camera=1 ! \
'video/x-raw,format=NV12,width=1280,height=720,framerate=30/1' ! \
qtimlvconverter ! \
qtimltflite \
model=$MODEL \
delegate=external \
external-delegate-path=$DELEGATE \
external-delegate-options="QNNExternalDelegate,backend_type=htp" ! \
qtimlpostprocess \
module=yolov8 \
labels=$LABELS \
results=10 \
settings='{"confidence": 40.0}' ! \
fpsdisplaysink video-sink=fakesink text-overlay=false sync=false
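The benchmark results on this page are taken after 10 seconds, so it helps to stop each run after the same fixed time. The following is a sketch, assuming the GNU coreutils timeout command is installed on the board. The helper name run_for_seconds is hypothetical. It sends SIGINT, which gst-launch-1.0 started with -e turns into an EOS before shutting down.

```shell
#!/bin/sh
# Run a command and interrupt it after a fixed number of seconds.
# usage: run_for_seconds <seconds> <command> [args...]
run_for_seconds() {
    seconds=$1
    shift
    # timeout exits with status 124 when it had to stop the command.
    timeout -s INT "$seconds" "$@"
}

# Example (on the board): stop the benchmark pipeline after 10 seconds.
#   run_for_seconds 10 gst-launch-1.0 -e -v <pipeline>
```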
After 10 seconds, the log shows the following results:
...
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 234, dropped: 0, current: 30.18, average: 30.03
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 249, dropped: 0, current: 29.76, average: 30.01
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 265, dropped: 0, current: 30.17, average: 30.02
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 280, dropped: 0, current: 29.96, average: 30.02
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 295, dropped: 0, current: 29.91, average: 30.02
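The average FPS can be pulled out of this log automatically, which makes it easier to compare runs. The following is a minimal sketch. It assumes the fpsdisplaysink last-message format shown in this log, and the function name last_average_fps is hypothetical.

```shell
#!/bin/sh
# Read a gst-launch-1.0 -v log on stdin and print the last reported
# "average:" value from the fpsdisplaysink last-message lines.
last_average_fps() {
    grep -o 'average: [0-9.]*' | tail -n 1 | awk '{ print $2 }'
}

# Example (on the board):
#   gst-launch-1.0 -e -v <pipeline> 2>&1 | tee run.log
#   last_average_fps < run.log
```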
CPU usage reported by top:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2800 ubuntu 20 0 2210588 281000 137748 S 7.9 0.8 0:03.47 gst-launch-1.0
Output of a perf element added before fpsdisplaysink:
perf: perf0; timestamp: 0:04:01.269363368; bps: 73728000.000; mean_bps: 73728000.000; fps: 29.967; mean_fps: 29.739; cpu: 13;
Output of watch -n 1 cat /sys/class/kgsl/kgsl-3d0/gpubusy:
4610 1000644
CPU baseline benchmark
Run this pipeline:
MODEL=/etc/models/yolox_quantized.tflite
LABELS=/etc/labels/yolox.json
GST_DEBUG=2 gst-launch-1.0 -e -v \
qtiqmmfsrc camera=1 ! \
'video/x-raw,format=NV12,width=1280,height=720,framerate=30/1' ! \
qtimlvconverter ! \
qtimltflite \
model=$MODEL \
delegate=none \
threads=4 ! \
qtimlpostprocess \
module=yolov8 \
labels=$LABELS \
results=10 \
settings='{"confidence": 40.0}' ! \
fpsdisplaysink video-sink=fakesink text-overlay=false sync=false
After 10 seconds, the log shows the following results:
...
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 25, dropped: 0, current: 3.11, average: 3.23
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 27, dropped: 0, current: 3.10, average: 3.22
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 29, dropped: 0, current: 3.11, average: 3.21
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 31, dropped: 0, current: 3.11, average: 3.20
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 33, dropped: 0, current: 3.11, average: 3.20
CPU usage reported by top:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
4986 ubuntu 20 0 784476 151296 96640 S 98.3 0.4 0:06.89 gst-launch-1.0
Output of a perf element added before fpsdisplaysink:
perf: perf0; timestamp: 0:12:15.400971224; bps: 7372800.000; mean_bps: 8192000.000; fps: 3.157; mean_fps: 3.158; cpu: 15;
Output of watch -n 1 cat /sys/class/kgsl/kgsl-3d0/gpubusy:
0 0
GPU baseline benchmark
Run this pipeline (MODEL and LABELS are set as in the previous benchmarks):
gst-launch-1.0 -e -v \
qtiqmmfsrc camera=1 ! \
'video/x-raw,format=NV12,width=1280,height=720,framerate=30/1' ! \
qtimlvconverter ! \
qtimltflite \
model=$MODEL \
delegate=gpu ! \
qtimlpostprocess \
module=yolov8 \
labels=$LABELS \
results=10 \
settings='{"confidence": 40.0}' ! \
fpsdisplaysink video-sink=fakesink text-overlay=false sync=false
After 10 seconds, the log shows the following results:
...
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 177, dropped: 0, current: 20.01, average: 20.43
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 188, dropped: 0, current: 20.41, average: 20.43
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 199, dropped: 0, current: 20.55, average: 20.44
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 210, dropped: 0, current: 20.63, average: 20.45
/GstPipeline:pipeline0/GstFPSDisplaySink:fpsdisplaysink0: last-message = rendered: 221, dropped: 0, current: 20.22, average: 20.44
CPU usage reported by top:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
6122 ubuntu 20 0 1000168 241008 163632 S 20.9 0.7 0:04.69 gst-launch-1.0
Output of a perf element added before fpsdisplaysink:
perf: perf0; timestamp: 0:10:20.605190113; bps: 51609600.000; mean_bps: 49971200.000; fps: 20.358; mean_fps: 20.372; cpu: 13;
Output of watch -n 1 cat /sys/class/kgsl/kgsl-3d0/gpubusy:
874231 1007485
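The two numbers printed by gpubusy can be converted to a percentage as busy / total * 100, which is how the GPU use values in the summary are derived. The following is a minimal sketch, assuming the file holds the busy and total counters in that order, as in the outputs above. The function name gpu_busy_percent is hypothetical.

```shell
#!/bin/sh
# Convert the two gpubusy counters (busy, total) into a percentage
# with two decimals. A total of 0 is reported as 0.00.
# usage: gpu_busy_percent <busy> <total>
gpu_busy_percent() {
    awk -v busy="$1" -v total="$2" 'BEGIN {
        if (total > 0) printf "%.2f\n", busy / total * 100
        else print "0.00"
    }'
}

# Example (on the board):
#   read -r busy total < /sys/class/kgsl/kgsl-3d0/gpubusy
#   gpu_busy_percent "$busy" "$total"
```

For the GPU run above, gpu_busy_percent 874231 1007485 gives 86.77, and for the HTP run, gpu_busy_percent 4610 1000644 gives 0.46.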
Summary of results
Tested on: Ubuntu 24.04
| Resolution | Backend | CPU use (%) | GPU use (%) | Frames rendered after 10 s | Average FPS |
|---|---|---|---|---|---|
| 360p | HTP | 7.9 | 0.38 | 296 | 30.02 |
| 360p | CPU | 98.7 | 0 | 47 | 3.18 |
| 360p | GPU | 20.9 | 88.86 | 221 | 20.51 |
| 720p | HTP | 7.9 | 0.46 | 295 | 30.02 |
| 720p | CPU | 98.3 | 0 | 33 | 3.20 |
| 720p | GPU | 20.9 | 86.77 | 221 | 20.44 |
| 1080p | HTP | 7.6 | 0.49 | 298 | 30.02 |
| 1080p | CPU | 98.7 | 0 | 33 | 3.26 |
| 1080p | GPU | 21.3 | 87.66 | 209 | 20.39 |
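To compare backends, the average FPS values in the table can be expressed as ratios. The following is a small sketch of that arithmetic. The helper name fps_ratio is hypothetical, and the inputs in the example are the 720p values from the table.

```shell
#!/bin/sh
# Print how many times faster the first FPS value is than the second,
# with two decimals.
# usage: fps_ratio <fps-a> <fps-b>
fps_ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        if (b > 0) printf "%.2f\n", a / b
        else print "n/a"
    }'
}

# Example with the 720p averages: HTP against CPU, then HTP against GPU.
#   fps_ratio 30.02 3.20
#   fps_ratio 30.02 20.44
```

With the 720p numbers, HTP is about 9.38 times faster than the CPU baseline and about 1.47 times faster than the GPU baseline.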
GPU use (%) is calculated from the values in the /sys/class/kgsl/kgsl-3d0/gpubusy file with the formula: GPU Busy Raw / GPU Total Raw * 100.
References
[1] Qualcomm, "IQ-9075," https://www.qualcomm.com/internet-of-things/products/iq9-series/iq-9075.
[2] Qualcomm, "IQ-9075 Documentation," https://docs.qualcomm.com/doc/80-75286-1/topic/iq-9075-hw-docs-homepage.html.
[3] Qualcomm, "Hardware overview - Qualcomm Dragonwing IQ-9075 Evaluation Kit," https://docs.qualcomm.com/doc/80-80020-261/topic/iq9-ug-hw-overview.html.
[4] Qualcomm, "Device description - QCS9075 Data Sheet," https://docs.qualcomm.com/doc/80-73417-1/topic/device-description.html.
[5] Qualcomm, "HTP - Qualcomm AI Runtime SDK," https://docs.qualcomm.com/doc/80-63442-10/topic/htp_backend.html.
[6] Qualcomm, "Supported Operations - Qualcomm AI Runtime SDK," https://docs.qualcomm.com/doc/80-63442-10/topic/SupportedOps.html.
[7] https://docs.qualcomm.com/doc/80-80021-261/topic/iq9-ug-run-sample-apps.html#procedure-ai